library(tidyverse)
Warning: package ‘tidyverse’ was built under R version 4.0.5
Registered S3 methods overwritten by 'dbplyr':
method from
print.tbl_lazy
print.tbl_sql
-- Attaching packages ----------------------------------------------------------------------- tidyverse 1.3.1 --
v ggplot2 3.3.3 v purrr 0.3.4
v tibble 3.1.0 v dplyr 1.0.5
v tidyr 1.1.3 v stringr 1.4.0
v readr 1.4.0 v forcats 0.5.1
Warning: package ‘ggplot2’ was built under R version 4.0.4
Warning: package ‘tibble’ was built under R version 4.0.5
Warning: package ‘tidyr’ was built under R version 4.0.4
Warning: package ‘dplyr’ was built under R version 4.0.4
Warning: package ‘forcats’ was built under R version 4.0.4
-- Conflicts -------------------------------------------------------------------------- tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
library(janitor) # for data cleaning
Warning: package ‘janitor’ was built under R version 4.0.4
Attaching package: ‘janitor’
The following objects are masked from ‘package:stats’:
chisq.test, fisher.test
library(readxl) # for readig excel files
library(visdat) # for a quick look at data quality
Warning: package ‘visdat’ was built under R version 4.0.4
Dataset downloaded from ONS on 5th June 2021. Both gardens space and public parks datasets are the April 2020 versions.
# get a quick visual summary of data types and the amount of missing data
gardens %>%
visdat::vis_dat()
gardens %>%
visdat::vis_miss()
ONS released the 2019 IMD data at LSOA scale. MySociety have produced IMD at various other scales including the MSOA scale.
imd_2019 <- read_csv("data/imd2019_msoa_level_data.csv")
-- Column specification ----------------------------------------------------------------------------------------
cols(
MSOAC = col_character(),
MSOAHOCLN = col_character(),
LAD19C = col_character(),
LAD19N = col_character(),
REG = col_character(),
LSOACOUNT = col_double(),
POPMID15 = col_double(),
`IMD19 SCORE` = col_double(),
MSOARANK = col_double(),
MSOADECILE = col_double(),
MSOAQUINTILE = col_double()
)
imd_2019
ONS provide MSOA level population data. Here I use the most recent release (mid 2019).
msoa_pops <- read_xlsx(
"data/SAPE22DT4-mid-2019-msoa-syoa-estimates-unformatted.xlsx",
sheet = "Mid-2019 Persons",
skip = 4) %>%
# process variable names for conistency
janitor::clean_names() %>%
# select minimal number of columns
select(msoa_code, population = all_ages)
msoa_pops %>%
visdat::vis_miss()
Overall average = 2.4 (2020 see here)
2017-18 was the most year I could find on average occupancy figures broken down by house/flat from MCHLG:
House = 2.5 people
High-rise flat = 1.9
Mid-rise flat = 1.8
More details here.
So for now, I’ll make a conservative assumption that the average occupancy for a flat is 1.8 (i.e. the lower of the mid and high rise figures above).
I looked at trying to estimate more accurately, but the figures for the number of dwellings per block are not available.
2020 estimates of the numbers of high and mid rise flat can be found here. These can be used to calculate an average flat occupancy rate.
12,500 blocks of high rise flats
77,500 blocks of mid rise flats
# calculate scaling factors for ave occupancy in an MSOA
# from national data
nat_ave_occ <- 2.4
nat_house_ave_occ <- 2.5
nat_flat_ave_occ <- 1.9
flat_scale <- nat_flat_ave_occ / nat_ave_occ
house_scale <- nat_house_ave_occ / nat_ave_occ
people_wo_gar <- gardens %>%
left_join(msoa_pops) %>%
mutate(house_ad_without_gar_count = houses_ad_count - houses_ad_with_gar_count,
flats_ad_without_gar_count = flats_ad_count - flats_ad_with_gar_count,
ave_occ = population / ad_count,
ave_occ_flat = flat_scale * ave_occ,
ave_occ_house = house_scale * ave_occ,
pop_calc = (ave_occ_flat * flats_ad_count) + (ave_occ_house * houses_ad_count),
pop_diff = population - pop_calc)
Joining, by = "msoa_code"
ggplot(people_wo_gar, aes(pop_diff)) +
geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 1282 rows containing non-finite values (stat_bin).
NA
May be more accurate to just use average occupancy rates for each MSOA??
people_wo_gar <- gardens %>%
left_join(msoa_pops) %>%
mutate(ad_wo_gar = ad_count * (1 - perc_of_ades_with_gar),
ave_occ = population / ad_count,
people_wo_gar = round(ad_wo_gar * ave_occ)) %>%
select(country_code:msoa_name, people_wo_gar)
Joining, by = "msoa_code"
people_wo_gar
ggplot(people_wo_gar, aes(people_wo_gar)) +
geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 1282 rows containing non-finite values (stat_bin).
https://www.robert-hickman.eu/post/getis-ord-heatmaps-tutorial/
library(sf)
# focus on Englnad and Wales
# (as shapefile for MSOAs only inlcudes England and Wales)
people_wo_gar_EW <- people_wo_gar %>%
filter(country_name == "England" |
country_name == "Wales")
median_msoa_peop_wo_gar = median(people_wo_gar_EW$people_wo_gar, na.rm = TRUE)
# read in MSOA boundary shapefile
msoa_bound <- st_read("data/MSOA_bound/Middle_Layer_Super_Output_Areas_(December_2011)_Boundaries.shp") %>%
select(msoa_code = msoa11cd,
msoa_name = msoa11nm)
Reading layer `Middle_Layer_Super_Output_Areas_(December_2011)_Boundaries' from data source `C:\Users\chris\Desktop\Data Analysis Projects\green_space\data\MSOA_bound\Middle_Layer_Super_Output_Areas_(December_2011)_Boundaries.shp' using driver `ESRI Shapefile'
Simple feature collection with 7201 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 82678 ymin: 5343 xmax: 655604.7 ymax: 657534.1
Projected CRS: OSGB 1936 / British National Grid
# read in countries boundary shapefile (for clipping polygon grid)
country_bound <- st_read("data/Countries_bound/Countries_(December_2017)_Boundaries.shp") %>%
filter(ctry17nm == "England" | ctry17nm == "Wales")
Reading layer `Countries_(December_2017)_Boundaries' from data source `C:\Users\chris\Desktop\Data Analysis Projects\green_space\data\Countries_bound\Countries_(December_2017)_Boundaries.shp' using driver `ESRI Shapefile'
Simple feature collection with 3 features and 10 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 5512.999 ymin: 5351.297 xmax: 655644.8 ymax: 1220302
Projected CRS: OSGB 1936 / British National Grid
# join garden data with
people_wo_gar_spatial <- msoa_bound %>%
left_join(people_wo_gar_EW) %>%
# median imputation for one msoa with missing
mutate(people_wo_gar = replace_na(people_wo_gar, median_msoa_peop_wo_gar))
Joining, by = c("msoa_code", "msoa_name")
# ###########################
# simulate the locations of people without a garden
# ###########################
# random locations withi an MSOA was too computationally intesnive
# so I went with placing the 'people' at MSOA centroids
people <- people_wo_gar_spatial %>%
mutate(people_wo_gar = round(people_wo_gar / 10),
geometry = st_centroid(geometry)) %>%
uncount(people_wo_gar)
sum(people_wo_gar_spatial$people_wo_gar)
[1] 6702350
# create boundary file
map_bound <- people_wo_gar_spatial %>%
summarise()# %>%
#st_transform(4326)
# people <- st_sample(
# select(people_wo_gar_spatial, -people_wo_gar),
# size = round(people_wo_gar_spatial$people_wo_gar / 100))
# ###########################
# create hexagonal grid
# ###########################
hex_polygons <- st_make_grid(map_bound, 7500,
crs = st_crs(people_wo_gar_spatial),
what = "polygons",
square = FALSE) %>%
st_sf()
# calculate number of people in each polygon
intersects <- st_intersects(hex_polygons, people)
hex_polygons$people_wo_gar <- lengths(intersects)
# ###########################
# create the plot
# ###########################
# ggplot(people_wo_gar_spatial) +
# #geom_sf(data = map_bound, colour = "black", size = 1) +
# geom_sf(data = hex_polygons, fill = "black") + # , aes(fill = people_wo_gar)
# scale_fill_viridis_c(trans = "log", direction = -1) +
# #scale_color_viridis_c(trans = "log") +
# ggthemes::theme_map()
# crop hexogens to outline of England and Wales
hex_polygons_EW <- hex_polygons[country_bound, ]
# hex_polygon_EW %>%
# count(people_wo_gar)
ggplot() +
#geom_sf(data = map_bound) +
#geom_sf(data = st_centroid(msoa_bound), colour = "grey") +
geom_sf(data = hex_polygons_EW, alpha = 0.5, mapping = aes(fill = people_wo_gar)) +
scale_fill_viridis_c(trans = "log", direction = -1) +
theme_void()
Warning: Transformation introduced infinite values in discrete y-axis
# hex_polygons %>%
# filter(people_wo_gar == 0)
#
# people_wo_gar_spatial %>%
# filter(people_wo_gar == 0)
# ###########################
# Smooth
# ###########################
https://pudding.cool/process/regional_smoothing/
Looking at change in park usage (in two periods in 2020) vs garden access.
Usage data from Google via ONS
ggplot(gar_park_use, aes(prop_ad_wo_gar, perc_ch_parks_spring_lock,
colour = is_urban_LAD)) +
geom_point() +
geom_smooth(method = "lm") +
#scale_x_log10() +
xlim(c(0,0.5)) +
facet_wrap(~is_urban_LAD)
`geom_smooth()` using formula 'y ~ x'
Warning: Removed 5 rows containing non-finite values (stat_smooth).
Warning: Removed 5 rows containing missing values (geom_point).
p4 <- ggplot(filter(gar_park_use_foe_eng, prop_urban > 0.8),
aes(perc_ch_parks_spring_lock, perc_ch_park_summer,
colour = prop_ad_wo_gar,
size = green_space_area_per_capita,
label = lad_name)) +
geom_point(alpha = 0.5)+
scale_colour_viridis_c(direction = -1, trans = "log", end = 0.9)
p4
Warning: Removed 7 rows containing missing values (geom_point).
plotly::ggplotly(p4)
Major cities
p
Warning: Using size for a discrete variable is not advised.
Warning: Removed 1 rows containing missing values (geom_point).
LA_bounds
Simple feature collection with 382 features and 10 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -116.1928 ymin: 5337.901 xmax: 655653.8 ymax: 1220302
Projected CRS: OSGB 1936 / British National Grid
First 10 features:
objectid lad19cd lad19nm lad19nmw bng_e bng_n long lat st_areasha
1 1 E06000001 Hartlepool <NA> 447160 531474 -1.27018 54.67614 93712620
2 2 E06000002 Middlesbrough <NA> 451141 516887 -1.21099 54.54467 53881564
3 3 E06000003 Redcar and Cleveland <NA> 464361 519597 -1.00608 54.56752 245069509
4 4 E06000004 Stockton-on-Tees <NA> 444940 518183 -1.30664 54.55691 204932954
5 5 E06000005 Darlington <NA> 428029 515648 -1.56835 54.53534 197475689
6 6 E06000006 Halton <NA> 354246 382146 -2.68853 53.33424 79084035
7 7 E06000007 Warrington <NA> 362744 388456 -2.56167 53.39163 180627984
8 8 E06000008 Blackburn with Darwen <NA> 369490 422806 -2.46360 53.70080 137022080
9 9 E06000009 Blackpool <NA> 332819 436635 -3.02199 53.82164 34870886
10 10 E06000010 Kingston upon Hull, City of <NA> 511894 431650 -0.30382 53.76920 71583612
st_lengths geometry
1 71011.93 MULTIPOLYGON (((447213.9 53...
2 44481.69 MULTIPOLYGON (((448609.9 52...
3 96703.99 MULTIPOLYGON (((455932.3 52...
4 123408.99 MULTIPOLYGON (((444157 5279...
5 107206.40 MULTIPOLYGON (((423496.6 52...
6 77771.10 MULTIPOLYGON (((358374.7 38...
7 114690.86 MULTIPOLYGON (((367308.2 39...
8 65284.97 MULTIPOLYGON (((369226.3 43...
9 34483.49 MULTIPOLYGON (((332985.7 44...
10 64681.12 MULTIPOLYGON (((510966.6 43...